Let’s load the tidyverse package.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ✔ purrr 0.3.4
## Warning: package 'tibble' was built under R version 4.1.2
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## Warning: package 'dplyr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Let’s import our data using read.csv.
data <- read.csv("data/chds6162_data.csv")
We use geom_point to make a scatterplot. Let’s make a
scatterplot that shows age on the x axis and height on the y axis.
ggplot(data = data,
mapping = aes(x = age,
y = ht)) +
geom_point()
## Warning: Removed 23 rows containing missing values (geom_point).
#another way you may see this
ggplot(data,aes(age,ht)) + geom_point()
## Warning: Removed 23 rows containing missing values (geom_point).

We use geom_histogram to make a histogram. Let’s make a
histogram of age.
ggplot(data = data,
mapping = aes(x = age)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
How does ggplot know what to plot on the y axis? It’s using the
default statistical transformation for geom_histogram,
which is stat = "bin".

If we add stat = "bin" we get the same thing. Each geom
has a default stat.
ggplot(data = data,
mapping = aes(x = age)) +
geom_histogram(stat = "bin")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
# shorter way to write it:
ggplot(data,aes(age)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
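To see what the bin stat actually hands to the plot, one option is to look at the computed layer data. The sketch below is only an illustration: `toy` is a small made-up data frame (not the CHDS data), and `p_hist` and `binned` are hypothetical names.

```r
library(ggplot2)

# Hypothetical toy ages standing in for data$age (not the CHDS values)
toy <- data.frame(age = c(19, 21, 22, 22, 25, 27, 27, 28, 31, 35, 38, NA))

p_hist <- ggplot(toy, aes(x = age)) +
  geom_histogram(bins = 5, na.rm = TRUE)

# layer_data() returns the data frame that the stat computed for layer 1:
# xmin and xmax are the bin edges, count is the value drawn on the y axis
binned <- layer_data(p_hist, 1)
print(binned[, c("xmin", "xmax", "count")])
```

The `count` column is what ends up on the y axis, which is why we never had to supply a y variable ourselves.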
We can adjust the number of bins using the bins
argument.
ggplot(data = data,
mapping = aes(x = age)) +
geom_histogram(bins = 10)
## Warning: Removed 2 rows containing non-finite values (stat_bin).
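As the `stat_bin()` message suggests, we can also set the width of each bin instead of the number of bins. A minimal sketch, assuming made-up ages in a hypothetical `toy` data frame; `boundary = 0` is an optional extra that lines the bin edges up on multiples of the bin width.

```r
library(ggplot2)

# Hypothetical toy ages (not the CHDS values)
toy <- data.frame(age = c(19, 21, 22, 22, 25, 27, 27, 28, 31, 35, 38))

# Each bin covers 5 years; boundary = 0 puts the edges at 15, 20, 25, ...
p_bw <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5, boundary = 0)

# Inspect the bins that were computed
bins_bw <- layer_data(p_bw)
print(bins_bw[, c("xmin", "xmax", "count")])
```

Setting `binwidth` is often easier to explain to readers than `bins`, because the width is in the units of the variable (here, years).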
There are two basic approaches to making bar charts, both of which
use geom_bar.
Approach #1
Use your full dataset.
Only assign a variable to the x axis.
Let ggplot use the default stat transformation
(stat = "count") to generate counts that it then plots on
the y axis.
Approach #2
Wrangle your data frame before plotting, possibly creating a new data frame in the process
Assign variables to the x and y axes
Use stat = "identity" to tell ggplot to use the data
exactly as it is
Let’s make a bar chart that shows how many mothers there are at each age.
ggplot(data = data,
mapping = aes(x = age)) +
geom_bar()
## Warning: Removed 2 rows containing non-finite values (stat_count).
The default statistical transformation for geom_bar is
count, so adding stat = "count" gives us the same result as our
previous plot, just as adding stat = "bin" did for the histogram.
ggplot(data = data,
mapping = aes(x = age)) +
geom_bar(stat = "count")
## Warning: Removed 2 rows containing non-finite values (stat_count).
#or
ggplot(data, aes(age)) + geom_bar()
## Warning: Removed 2 rows containing non-finite values (stat_count).
Here’s what’s going on.
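One way to check this for yourself is to do the counting by hand and compare. The sketch below uses a made-up `toy` data frame (not the CHDS data), and `p1`, `p2`, and `age_counts` are hypothetical names. It builds the same bar chart with both approaches.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy ages (not the CHDS values)
toy <- data.frame(age = c(22, 22, 25, 27, 27, 27, 31))

# Approach #1: geom_bar() counts the rows for each age itself (stat = "count")
p1 <- ggplot(toy, aes(x = age)) +
  geom_bar()

# Approach #2: count first, then plot the counts exactly as they are
age_counts <- toy %>%
  count(age)

p2 <- ggplot(age_counts, aes(x = age, y = n)) +
  geom_bar(stat = "identity")

# The counts that approach #1 computed match the table from approach #2
print(age_counts)
print(layer_data(p1)[, c("x", "count")])
```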

It’s often easier to do our analysis work, save a data frame, and then use this to plot.
Let’s create a data frame of gestation length (this time in weeks) by the mother’s smoking habits.
gestation_by_smoke <- data %>%
  mutate(gestation_w = gestation / 7,
         smoke = case_when(
           smoke == 1 ~ "smokes now",
           smoke == 2 ~ "until now",
           smoke == 3 ~ "once did",
           smoke == 0 ~ "never")) %>%
  group_by(smoke) %>%
  summarize(gestation_w = mean(gestation_w, na.rm = TRUE)) %>%
  drop_na(smoke)
Then let’s use this data frame to make a bar chart. The
stat = "identity" here tells ggplot to use the exact data
points without any stat transformations.
ggplot(data = gestation_by_smoke,
mapping = aes(x = smoke,
y = gestation_w)) +
geom_bar(stat = "identity")
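You may also see geom_col used for this kind of plot. It draws bars from values that are already in the data, so it behaves like geom_bar with stat = "identity". A minimal sketch, assuming a made-up summary table (`toy_summary` and its numbers are hypothetical, not the CHDS results):

```r
library(ggplot2)

# Hypothetical summary table with made-up means (not the CHDS results)
toy_summary <- data.frame(
  smoke = c("never", "once did", "smokes now", "until now"),
  gestation_w = c(40.0, 39.9, 39.7, 40.1)
)

# geom_col() plots the y values as they are, with no stat transformation
p_col <- ggplot(toy_summary, aes(x = smoke, y = gestation_w)) +
  geom_col()

p_col
```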
color and fill
We add the color argument within aes so
that the data in that variable are mapped to that aesthetic
property.
Let’s color the points in our previous scatterplot by the mother’s smoking habits.
data <- data %>%
  mutate(smoke_lbl = case_when(
    smoke == 1 ~ "smokes now",
    smoke == 2 ~ "until now",
    smoke == 3 ~ "once did",
    smoke == 0 ~ "never"))
ggplot(data = data,
mapping = aes(x = age,
y = ht,
color = smoke_lbl)) +
geom_point()
## Warning: Removed 23 rows containing missing values (geom_point).
# what if our color variable is continuous rather than a set of labels?
ggplot(data,aes(age,ht,color = smoke)) + geom_point()
## Warning: Removed 23 rows containing missing values (geom_point).
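When a numeric code like this really stands for categories, wrapping it in factor() inside aes is one way to get a separate color per code instead of a gradient. The sketch below uses a made-up `toy` data frame (not the CHDS data); `p_cont` and `p_disc` are hypothetical names.

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values); smoke is a numeric code
toy <- data.frame(
  age = c(22, 25, 27, 31, 35, 38),
  ht = c(62, 64, 66, 63, 65, 67),
  smoke = c(0, 1, 2, 3, 0, 1)
)

# Numeric variable: ggplot uses a continuous color gradient
p_cont <- ggplot(toy, aes(age, ht, color = smoke)) +
  geom_point()

# Factor: ggplot uses a discrete scale with one color per code
p_disc <- ggplot(toy, aes(age, ht, color = factor(smoke))) +
  geom_point()

p_disc
```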
Let’s try the same thing with our last bar chart (gestation_by_smoke).
ggplot(data = gestation_by_smoke,
mapping = aes(x = smoke,
y = gestation_w,
color = smoke)) +
geom_bar(stat = "identity")
That didn’t work! The color aesthetic only changes the outline of each bar. Let’s try fill instead.
ggplot(data = gestation_by_smoke,
mapping = aes(x = smoke,
y = gestation_w,
fill = smoke)) +
geom_bar(stat = "identity")
We can change which colors the data is mapped to by using a
scale_ function.
Let’s use a built-in palette like scale_color_viridis_d
(d = discrete data).*
*FYI: The viridis scales provide colour maps that are perceptually uniform in both colour and black-and-white. They are also designed to be perceived by viewers with common forms of colour blindness. The available color scales include viridis, magma, plasma, and inferno.
ggplot(data = data,
mapping = aes(x = age,
y = ht,
color = smoke_lbl)) +
geom_point() +
scale_color_viridis_d(option = "plasma")
## Warning: Removed 568 rows containing missing values (geom_point).
# shorter version
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) + geom_point() + scale_color_viridis_d(option = "plasma")
## Warning: Removed 568 rows containing missing values (geom_point).
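If you would rather pick the colors yourself, scale_color_manual takes a named vector that maps each label to a color. A minimal sketch, assuming a made-up `toy` data frame and arbitrary hex colors (`my_colors` is a hypothetical name):

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values)
toy <- data.frame(
  age = c(22, 25, 27, 31, 35, 38),
  ht = c(62, 64, 66, 63, 65, 67),
  smoke_lbl = c("never", "smokes now", "once did",
                "never", "until now", "smokes now")
)

# Arbitrary example colors, one per label
my_colors <- c(
  "never" = "#1b9e77",
  "once did" = "#d95f02",
  "smokes now" = "#7570b3",
  "until now" = "#e7298a"
)

p_manual <- ggplot(toy, aes(age, ht, color = smoke_lbl)) +
  geom_point() +
  scale_color_manual(values = my_colors)

p_manual
```

Naming the vector elements keeps each label tied to its color even if the order of the labels changes.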
To add labels to our plot, we use labs. Let’s add a
title argument to the last scatterplot.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() +
scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits")
## Warning: Removed 568 rows containing missing values (geom_point).
We can add a subtitle as well.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962")
## Warning: Removed 568 rows containing missing values (geom_point).
We can change the x and y axis labels using the x and
y arguments, and the legend title using the color argument.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking habits")
## Warning: Removed 568 rows containing missing values (geom_point).
To add a theme to a plot, we use the theme_ set of
functions. There are several built-in themes. For instance,
theme_minimal.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() +
scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)", color = "Smoking habits") +
theme_minimal()
## Warning: Removed 568 rows containing missing values (geom_point).
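A built-in theme can be adjusted further. The sketch below is one possible tweak on made-up data (`toy` and `p_theme` are hypothetical names): `base_size` changes the base font size of the theme, and a `theme()` call added afterwards overrides individual settings such as the legend position.

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values)
toy <- data.frame(
  age = c(22, 25, 27, 31, 35, 38),
  ht = c(62, 64, 66, 63, 65, 67),
  smoke_lbl = c("never", "smokes now", "once did",
                "never", "until now", "smokes now")
)

p_theme <- ggplot(toy, aes(age, ht, color = smoke_lbl)) +
  geom_point() +
  theme_minimal(base_size = 14) +      # larger base font
  theme(legend.position = "bottom")    # move the legend below the plot

p_theme
```

The order matters: `theme()` has to come after `theme_minimal()`, otherwise the complete theme replaces the tweak.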
There are also packages that give you themes you can apply to your plots.
ggthemes package
library(ggthemes)
#?ggthemes
We can then use a theme from this package
(theme_excel_new) to make our plots look like those in the
new version of Excel.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking habits") +
theme_excel_new()
## Warning: Removed 568 rows containing missing values (geom_point).
# what about APA style? theme_apa comes from the jtools package
library(jtools)
## Warning: package 'jtools' was built under R version 4.1.2
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking habits") +
theme_apa()
## Warning: Removed 568 rows containing missing values (geom_point).
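Because a plot can be stored in an object, we do not have to retype the whole thing to try a different theme. A minimal sketch on made-up data (`toy`, `base_plot`, and the two plot names are hypothetical); it uses themes built into ggplot2 so it does not depend on extra packages.

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values)
toy <- data.frame(
  age = c(22, 25, 27, 31, 35, 38),
  ht = c(62, 64, 66, 63, 65, 67),
  smoke_lbl = c("never", "smokes now", "once did",
                "never", "until now", "smokes now")
)

# Build the plot once and store it
base_plot <- ggplot(toy, aes(age, ht, color = smoke_lbl)) +
  geom_point() +
  labs(x = "Age", y = "Height (inches)", color = "Smoking habits")

# Add a different theme to the stored plot each time
plot_minimal <- base_plot + theme_minimal()
plot_classic <- base_plot + theme_classic()

plot_minimal
plot_classic
```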
You can make small multiples by adding just one line of code using the
facet_wrap function. Let’s make a separate plot for each of the
labels in the smoking variable.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking habits") +
theme_apa() +
facet_wrap(~smoke_lbl)
## Warning: Removed 568 rows containing missing values (geom_point).
We can do this for any type of figure. Let’s make multiple histograms for age by smoking habits.
ggplot(data = data,
mapping = aes(x = age)) +
geom_histogram() +
theme_apa() +
facet_wrap(~smoke_lbl)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
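facet_wrap also has arguments that control the layout of the panels. The sketch below, on made-up data (`toy` and `p_facets` are hypothetical names), uses `ncol` to set the number of columns and `scales = "free_y"` to give each panel its own y axis, which can help when the groups differ a lot in size.

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values)
toy <- data.frame(
  age = c(19, 22, 25, 27, 28, 31, 33, 35, 38, 24, 29, 36),
  smoke_lbl = rep(c("never", "once did", "smokes now", "until now"), times = 3)
)

p_facets <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 5, boundary = 0) +
  facet_wrap(~smoke_lbl, ncol = 2, scales = "free_y")  # 2 columns, free y axes

p_facets
```

Keep in mind that free axes make panels harder to compare directly, so this is a trade-off rather than a default to always use.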
Another example:
ggplot(data = data,
mapping = aes(x = age,
y = ht,
color = smoke_lbl)) +
geom_point() +
scale_color_viridis_d(option = "magma") +
labs(title = "Association Between Age and Height",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking Habits") +
theme_economist() +
facet_wrap(~ed)
## Warning: Removed 568 rows containing missing values (geom_point).
RMarkdown: just knit your file and your plots will show up as part of your HTML, Word, or PDF document.
On its own: use the ggsave function. By default,
ggsave saves the last plot you made, so you can call it right
after each plot you want to save.
ggplot(data,mapping = aes(age,ht,color = smoke_lbl)) +
geom_point() + scale_color_viridis_d(option = "plasma") +
labs(title = "Mother's age and height by smoking habits",
subtitle = "Data from the Child Health and Development Studies 1961 and 1962",
x = "Age",
y = "Height (inches)",
color = "Smoking habits") +
theme_apa() +
facet_wrap(~smoke_lbl)
## Warning: Removed 568 rows containing missing values (geom_point).
ggsave("plots/plot_example.png")
## Saving 7 x 5 in image
## Warning: Removed 568 rows containing missing values (geom_point).
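ggsave can also be told exactly which plot to save and at what size, instead of relying on the last plot and the default dimensions. The sketch below is an illustration under a few assumptions: the data are made up, the file goes into a hypothetical `plots` folder inside the session's temporary directory, and the folder is created first because ggsave may fail when the target folder does not exist.

```r
library(ggplot2)

# Hypothetical toy data (not the CHDS values)
toy <- data.frame(
  age = c(22, 25, 27, 31, 35, 38),
  ht = c(62, 64, 66, 63, 65, 67)
)

p_save <- ggplot(toy, aes(age, ht)) +
  geom_point()

# Hypothetical output folder inside the temporary directory
out_dir <- file.path(tempdir(), "plots")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "plot_example.png")

# Save a specific plot object with explicit size (inches) and resolution
ggsave(out_file, plot = p_save, width = 7, height = 5, units = "in", dpi = 300)
```

Passing `plot =` explicitly means the result does not depend on which plot happened to be drawn last.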
We can save our plot to other formats as well. PDF is a great option.
```r
ggsave("plots/example.pdf")
## Saving 7 x 5 in image
## Warning: Removed 568 rows containing missing values (geom_point).